15 research outputs found

    Decomposing feature-level variation with Covariate Gaussian Process Latent Variable Models

    The interpretation of complex high-dimensional data typically requires the use of dimensionality reduction techniques to extract explanatory low-dimensional representations. However, in many real-world problems these representations may not be sufficient to aid interpretation on their own, and it would be desirable to interpret the model in terms of the original features themselves. Our goal is to characterise how feature-level variation depends on latent low-dimensional representations, external covariates, and non-linear interactions between the two. In this paper, we propose to achieve this through a structured kernel decomposition in a hybrid Gaussian Process model which we call the Covariate Gaussian Process Latent Variable Model (c-GPLVM). We demonstrate the utility of our model on simulated examples and applications in disease progression modelling from high-dimensional gene expression data in the presence of additional phenotypes. In each setting we show how the c-GPLVM can extract low-dimensional structures from high-dimensional data sets whilst allowing a breakdown of feature-level variability that is not present in other commonly used dimensionality reduction approaches.

    Accurate prediction of heritable traits from DNA

    An individual's DNA sequence can be used to predict heritable traits such as hair colour, body weight and various disease risks. In this work we study phenotype prediction using a large yeast population as an example. Our aim is to determine how accurately phenotypes can be predicted from genotype alone, and how far prediction accuracy can be increased when measurements of the individual's other phenotypes are known. In modelling, we account for the dependencies between individuals that arise from the experimental design. We compare the predictive performance of linear mixed models, multivariate linear mixed models and mixed random forests. We achieve a prediction accuracy that exceeds the classical limits based on heritability. In addition, we propose a new prediction method that combines the strengths of linear mixed models and mixed random forests. We show on simulated data that its prediction accuracy exceeds that of the alternative models.

    An MDL method for detecting differentially methylated regions

    The question of which factors regulate gene expression is of biological interest. DNA methylation is one of several mechanisms that cells use to silence genes. Methylation has a functional role only at specific positions of the DNA sequence, called CpG sites. The methylation of consecutive CpG sites is often similar, so it makes sense to look for longer regions with a uniform methylation pattern. Differentially methylated regions (DMRs) are runs of consecutive CpG sites where methylation differs between groups (for example, cancer patients and healthy individuals, young and old individuals, or different tissue types). The aim of this bachelor's thesis is to develop a method for detecting differentially methylated regions that can be used primarily on methylation array data. To this end, we aim to divide the DNA sequence optimally into segments and then decide for each segment whether differential methylation is present. The thesis begins with an overview of DNA methylation and formulates the mathematical problem, which consists of finding consecutive segments in the data. This is followed by an overview of the encoding of probabilistic models and of the MDL principle, on which we rely to find the optimal segmentation. Chapter 3 presents a general framework for dividing the data into segments in the best way, using models defined segment-wise and choosing the best among them in the MDL sense. This framework allows arbitrary models to be used to describe the data on the segments, provided the likelihood of the data can be computed from them. The framework is then applied to two specific methods. Of these, the method based on fitting linear models to the segments is studied in more detail: we test it on both simulated and biological data, and compare the results with one possible alternative. Both methods were implemented in the R programming language.

    Bayesian statistics and modelling

    Bayesian statistics is an approach to data analysis based on Bayes’ theorem, where available knowledge about parameters in a statistical model is updated with the information in observed data. The background knowledge is expressed as a prior distribution and combined with observational data in the form of a likelihood function to determine the posterior distribution. The posterior can also be used for making predictions about future events. This Primer describes the stages involved in Bayesian analysis, from specifying the prior and data models to deriving inference, model checking and refinement. We discuss the importance of prior and posterior predictive checking, selecting a proper technique for sampling from a posterior distribution, variational inference and variable selection. Examples of successful applications of Bayesian analysis across various research fields are provided, including in social sciences, ecology, genetics, medicine and more. We propose strategies for reproducibility and reporting standards, outlining an updated WAMBS (when to Worry and how to Avoid the Misuse of Bayesian Statistics) checklist. Finally, we outline the impact of Bayesian analysis on artificial intelligence, a major goal in the next decade.
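As a rough illustration of the prior-to-posterior update described in this abstract, the following minimal sketch uses a conjugate Beta-Binomial model. The model choice, the function names `beta_binomial_update` and `beta_mean`, and the example counts are assumptions made here for illustration; they are not taken from the Primer itself.

```python
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Combine a Beta(prior_a, prior_b) prior with binomial data.

    For this conjugate pair the posterior is again a Beta distribution,
    whose parameters are the prior parameters plus the observed counts.
    """
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical example: uniform Beta(1, 1) prior, then 7 successes and 3 failures.
post_a, post_b = beta_binomial_update(1.0, 1.0, successes=7, failures=3)
print(post_a, post_b, beta_mean(post_a, post_b))
```

With a uniform prior, the posterior mean lies between the prior mean (0.5) and the observed success rate (0.7), which is the sense in which the data update the prior knowledge.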

    Predicting quantitative traits from genome and phenome with near perfect accuracy.

    In spite of decades of linkage and association studies and its potential impact on human health, reliable prediction of an individual's risk for heritable disease remains difficult. Large numbers of mapped loci do not explain substantial fractions of heritable variation, leaving an open question of whether accurate complex trait predictions can be achieved in practice. Here, we use a genome-sequenced population of ∼7,000 yeast strains of high but varying relatedness, and predict growth traits from family information, effects of segregating genetic variants and growth in other environments with an average coefficient of determination R² of 0.91. This accuracy exceeds narrow-sense heritability, approaches limits imposed by measurement repeatability and is higher than achieved with a single assay in the laboratory. Our results prove that very accurate prediction of complex traits is possible, and suggest that additional data from families rather than reference cohorts may be more useful for this purpose.

    Powerful decomposition of complex traits in a diploid model

    Explaining trait differences between individuals is a core and challenging aim of life sciences. Here, we introduce a powerful framework for complete decomposition of trait variation into its underlying genetic causes in diploid model organisms. We sequence and systematically pair the recombinant gametes of two intercrossed natural genomes into an array of diploid hybrids with fully assembled and phased genomes, termed Phased Outbred Lines (POLs). We demonstrate the capacity of this approach by partitioning fitness traits of 6,642 Saccharomyces cerevisiae POLs across many environments, achieving near complete trait heritability and precisely estimating additive (73%), dominance (10%), second (7%) and third (1.7%) order epistasis components. We map quantitative trait loci (QTLs) and find nonadditive QTLs to outnumber (3:1) additive loci, dominant contributions to heterosis to outnumber overdominant, and extensive pleiotropy. The POL framework offers the most complete decomposition of diploid traits to date and can be adapted to most model organisms.

    DNA methylome profiling of human tissues identifies global and tissue-specific methylation patterns

    BACKGROUND: DNA epigenetic modifications, such as methylation, are important regulators of tissue differentiation, contributing to processes of both development and cancer. Profiling the tissue-specific DNA methylome patterns will provide novel insights into normal and pathogenic mechanisms, as well as help in future epigenetic therapies. In this study, 17 somatic tissues from four autopsied humans were subjected to functional genome analysis using the Illumina Infinium HumanMethylation450 BeadChip, covering 486 428 CpG sites. RESULTS: Only 2% of the CpGs analyzed are hypermethylated in all 17 tissue specimens; these permanently methylated CpG sites are located predominantly in gene-body regions. In contrast, 15% of the CpGs are hypomethylated in all specimens and are primarily located in regions proximal to transcription start sites. A vast number of tissue-specific differentially methylated regions are identified and considered likely mediators of tissue-specific gene regulatory mechanisms since the hypomethylated regions are closely related to known functions of the corresponding tissue. Finally, a clear inverse correlation is observed between promoter methylation within CpG islands and gene expression data obtained from publicly available databases. CONCLUSIONS: This genome-wide methylation profiling study identified tissue-specific differentially methylated regions in 17 human somatic tissues. Many of the genes corresponding to these differentially methylated regions contribute to tissue-specific functions. Future studies may use these data as a reference to identify markers of perturbed differentiation and disease-related pathogenic mechanisms.